Tests of Automatic Annotation Using KOG Proteins and ESTs from 4 Eukaryotic Organisms

Authors

  • Maurício de Alvarenga Mudado
  • Estevam Bravo-Neto
  • José Miguel Ortega
Abstract

BLAST homology searches have been widely used to assign function to novel sequences. Secondary databases such as KOG can serve this purpose because their sequences carry a functional classification. We devised an experiment in which public ESTs from four eukaryotic organisms whose protein sequences are present in the KOG database are classified into functional KOG categories using tBLASTn. First, we assigned the ESTs from one organism to KTL (KOG, TWOG and LSE) proteins; then we searched the database depleted of that organism's proteins to simulate a novel transcriptome. The classification was correct (assignment equals annotation) in 87.2%, 96.8%, 92.0% and 88.7% of cases for A. thaliana (Ath), C. elegans (Cel), D. melanogaster (Dme) and H. sapiens (Hsa), respectively. We estimated amino-acid identity cutoffs for use with tBLASTn in each organism. These cutoffs trim the same number of events as a BLASTn search, in order to minimize false positives caused by sequence errors. The cutoff values were 80%, 78%, 78% and 84% for Hsa, Dme, Cel and Ath, respectively. We then evaluated our system by comparing the KTL categories of the assigned ESTs with the KTL categories into which the ESTs were classified without the organism's KTL proteins. Moreover, we show the annotation potential of the KOG database and of the ESTs used.

Supplementary Information can be found at: http://www.biodados.icb.ufmg.br

IV BSB 9. Please see the Symposium Proceedings in Lecture Notes in Bioinformatics (LNBI nr. 3594), Springer Verlag, for this paper.
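The evaluation described in the abstract (assign each EST to a protein of its own organism, annotate it again against the database without that organism, then check whether the two KTL categories agree) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the function names, the use of bit score to pick the best hit, and the choice to apply the identity cutoff only in the assignment step are all assumptions. Only the four cutoff values come from the abstract.

```python
from typing import Dict, Iterable, NamedTuple

# Amino-acid identity cutoffs (%) for tBLASTn, as reported in the abstract.
IDENTITY_CUTOFF = {"Hsa": 80.0, "Dme": 78.0, "Cel": 78.0, "Ath": 84.0}


class Hit(NamedTuple):
    """One tBLASTn hit (hypothetical record layout)."""
    est_id: str
    protein_id: str
    identity: float   # percent amino-acid identity
    bit_score: float


def best_hits(hits: Iterable[Hit], min_identity: float = 0.0) -> Dict[str, Hit]:
    """Keep, per EST, the highest-scoring hit that passes the identity cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity:
            continue
        current = best.get(hit.est_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.est_id] = hit
    return best


def fraction_correct(assigned: Dict[str, Hit],
                     annotated: Dict[str, Hit],
                     category_of: Dict[str, str]) -> float:
    """Share of ESTs whose annotated category equals the assigned category.

    Only ESTs present in both the assignment and the annotation are compared.
    """
    compared = 0
    correct = 0
    for est_id, assigned_hit in assigned.items():
        annotated_hit = annotated.get(est_id)
        if annotated_hit is None:
            continue
        compared += 1
        if category_of[assigned_hit.protein_id] == category_of[annotated_hit.protein_id]:
            correct += 1
    return correct / compared if compared else 0.0
```

In use, `best_hits(own_organism_hits, IDENTITY_CUTOFF["Hsa"])` would play the role of the assignment, `best_hits(depleted_database_hits)` the annotation, and `category_of` a lookup from protein identifier to KTL category.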

Similar resources

A picture of gene sampling/expression in model organisms using ESTs and KOG proteins.

The expressed sequence tag (EST) is an instrument of gene discovery. When available in large numbers, ESTs may be used to estimate gene expression. We analyzed gene expression by EST sampling, using the KOG database, which includes 24,154 proteins from Arabidopsis thaliana (Ath), 17,101 from Caenorhabditis elegans (Cel), 10,517 from Drosophila melanogaster (Dme), and 26,324 from Homo sapiens (H...

Mining microorganism EST databases in the quest for new proteins.

Microorganisms with large genomes are commonly the subjects of single-round partial sequencing of cDNA, generating expressed sequence tags (ESTs). Usually there is a great distance between gene discovery by EST projects and submission of amino acid sequences to public databases. We analyzed the relationship between available ESTs and protein sequences and used the sequences available in the sec...

Tags Re-ranking Using Multi-level Features in Automatic Image Annotation

Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. In most cases, inappropriate tags generated by users, as well as images without any tags, are among the challenges in this field and have a negative effect on the query result. In this paper, a new method is presented for automatic image...

Fuzzy Neighbor Voting for Automatic Image Annotation

With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...

Testing the performance of automated annotation of ESTs with the Kegg Orthology (KO) database demonstrates lack of completeness of clusters.

The KEGG Orthology (KO) database was tested as a source for automated annotation of expressed sequence tags (ESTs). We used a control experiment where every EST was assigned to its cognate protein, and an annotation experiment where the ESTs were annotated by proteins from other organisms. Analyzing the results, we could assign classes to the annotation: correct, changed and speculated. The cor...

Publication date: 2005